Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Abstract:
Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improving the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which NER data is extracted is naturally imbalanced, since chemical entities are far fewer than the other segments of the text. In this paper, the class imbalance problem is studied in the context of chemical named entity recognition, and an adapted version of random undersampling for NER data is used to generate a pool of classifiers. To keep the class distribution balanced within each sentence, the well-known random undersampling method is modified into a sentence-based version, in which samples are randomly removed within each sentence rather than across the dataset as a whole. Furthermore, to take advantage of combining a set of diverse predictors, an ensemble is created from classifiers trained on the different training sets produced by sentence-based undersampling. The proposed approach is developed and tested using the ChemDNER corpus released by BioCreative IV. Results show that the proposed method improves the classification performance of the baseline classifiers, mainly as a result of an increase in recall. Furthermore, the combination of high-performing classifiers trained on undersampled training data surpasses the performance of all of the single best classifiers and of the combination of classifiers trained on the full data.
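The two ideas in the abstract, sentence-level undersampling and classifier combination, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes token-level labels in which "O" marks the majority (non-entity) class, and the `keep_ratio` parameter, the function names, and the use of simple majority voting as the combination rule are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import random
from collections import Counter


def undersample_sentence(tokens, labels, keep_ratio, rng, outside="O"):
    """Randomly drop majority-class tokens inside a single sentence.

    Entity tokens are always kept; each "outside" token is kept with
    probability keep_ratio (an assumed parameter, not from the paper).
    """
    kept = [
        (tok, lab)
        for tok, lab in zip(tokens, labels)
        if lab != outside or rng.random() < keep_ratio
    ]
    return [tok for tok, _ in kept], [lab for _, lab in kept]


def make_training_sets(sentences, n_sets, keep_ratio, seed=0):
    """Build n_sets differently undersampled copies of the corpus.

    Each copy could be used to train one member of the classifier pool.
    """
    rng = random.Random(seed)
    return [
        [undersample_sentence(toks, labs, keep_ratio, rng) for toks, labs in sentences]
        for _ in range(n_sets)
    ]


def majority_vote(predictions):
    """Combine per-token label sequences predicted by several classifiers.

    Majority voting is an assumed stand-in for the paper's combination rule.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

Because sampling happens inside each sentence, every sentence keeps all of its entity tokens and loses only part of its non-entity context, whereas dataset-level undersampling could thin some sentences much more than others. Varying the random draws across the training sets is what gives the pool its diversity.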
Similar resources
Named Entity Recognition through Classifier Combination
This paper presents a classifier-combination experimental framework for named entity recognition in which four diverse classifiers (robust linear classifier, maximum entropy, transformation-based learning, and hidden Markov model) are combined under different conditions. When no gazetteer or other additional training resources are used, the combined system attains a performance of 91.6F on the ...
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and a hybrid of the two. Machine-learning-based methods perform well on the Persian language if they are trained with good features. To get good performanc...
Dutch Named Entity Recognition using Classifier Ensembles
Named Entity Recognition (NER) is the task of automatically identifying names within text and classifying them into categories, such as persons, locations and organizations. A variety of machine learning algorithms has been applied to the task, with research often aimed at feature selection and parameter optimization to improve a single classifier’s performance. However, finding the optimal fea...
Named Entity Recognition in Persian Text using Deep Learning
Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subtask of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
Memory-Based Named Entity Recognition
We apply a memory-based learner to the CoNLL-2002 shared task: language-independent named entity recognition. We use three additional techniques for improving the base performance of the learner: cascading, feature selection and system combination. The overall system is trained with two types of features: words and substrings of words which are relevant for this particular task. It is tested on...
Named Entity Recognition as a House of Cards: Classifier Stacking
This paper presents a classifier stacking-based approach to the named entity recognition task (NER henceforth). Transformation-based learning (Brill, 1995), Snow (sparse network of winnows (Muñoz et al., 1999)) and a forward-backward algorithm are stacked (the output of one classifier is passed as input to the next classifier), yielding considerable improvement in performance. In addition, in a...
Journal title
Volume 7, Issue 2
Pages 311-319
Publication date 2019-04-01